Large-Scale Pattern Search Using Reduced-Space On-Disk Suffix Arrays
Authors
Abstract
Similar resources
Parallel Suffix Arrays for Linguistic Pattern Search
The paper presents the results of an analysis of the merits and problems of using suffix arrays as an index data structure for annotated natural-language corpora. It shows how multiple suffix arrays can be combined to represent layers of annotation, and how this enables matches for complex linguistic patterns to be identified in the corpus quickly and, for a large subclass of patterns, with gre...
Distributed text search using suffix arrays
Text search is a classical problem in Computer Science, with many data-intensive applications. For this problem, suffix arrays are among the most widely known and used data structures, enabling fast searches for phrases, terms, substrings and regular expressions in large texts. Potential application domains for these operations include large-scale search services, such as Web search engines, wh...
Metric Suffix Array For Large-Scale Similarity Search
We propose the Metric Suffix Array (MSA) as a novel and efficient data structure for permutation-based indexing. The Metric Suffix Array follows the same principles as the suffix array. The suffix array is mainly used for text indexing. Here, we build the MSA as an alternative for large-scale content-based information retrieval. We also show how the MSA is scalable for parallel and distributed...
Suffix Arrays on Words
Surprisingly enough, it is not yet known how to build directly a suffix array that indexes just the k positions at word-boundaries of a text T[1, n], taking O(n) time and O(k) space in addition to T. We propose a class-note solution to this problem that achieves such optimal time and space bounds. Word-based versions of indexes achieving the same time/space bounds were already known for suffi...
Suffix Trees and Suffix Arrays
Iowa State University. Contents: 1.1 Basic Definitions and Properties; 1.2 Linear Time Construction Algorithms (Suffix Trees vs. Suffix Arrays; Linear Time Construction of Suffix Trees; Linear Time Construction of Suffix Arrays; Space Issues); 1.3 Applications ...
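Several of the works listed above rely on the same basic operation: locating every occurrence of a pattern with a suffix array. The sketch below illustrates that operation in its simplest in-memory form. It is an assumption-laden illustration, not the reduced-space on-disk method of the paper: construction is naive (sorting suffixes directly), everything lives in RAM, and the function names `build_suffix_array` and `find_occurrences` are hypothetical.

```python
from __future__ import annotations


def build_suffix_array(text: str) -> list[int]:
    # Naive construction (hypothetical helper): sort suffix start positions
    # by the suffix each one begins. Fine for illustration, far too slow
    # and memory-hungry for large-scale or on-disk use.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    # All suffixes starting with `pattern` form one contiguous run in `sa`.
    # Truncating each suffix to len(pattern) characters preserves the sorted
    # order, so two binary searches find the run's boundaries.
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Return text positions of the matches in increasing order.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)                               # [5, 3, 1, 0, 4, 2]
    print(find_occurrences(text, sa, "ana"))  # [1, 3]
```

Each comparison inspects at most m characters, so a search costs O(m log n) character comparisons. This also explains why on-disk variants care about layout: every probe of the binary search may touch a different part of the text.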
Journal
Journal title: IEEE Transactions on Knowledge and Data Engineering
Year: 2014
ISSN: 1041-4347
DOI: 10.1109/tkde.2013.129